Abstract —Data-aware methods for dimensionality reduction 
and matrix decomposition aim to find low-dimensional structure 
in a collection of data. Classical approaches discover such structure 
by learning a basis that can efficiently express the collection. 
Recently, “self expression”, the idea of using a small subset of 
data vectors to represent the full collection, has been developed 
as an alternative to learning. Here, we introduce a scalable 
method for computing sparse SElf-Expressive Decompositions 
(SEED). SEED is a greedy method that constructs a basis by 
sequentially selecting incoherent vectors from the dataset. After 
forming a basis from a subset of vectors in the dataset, SEED 
then computes a sparse representation of the dataset with respect 
to this basis. We develop sufficient conditions under which SEED 
exactly represents low rank matrices and vectors sampled from 
a union of independent subspaces. We show how SEED can be 
used in applications ranging from matrix approximation and 
denoising to clustering, and apply it to numerous real-world 
datasets. Our results demonstrate that SEED is an attractive 
low-complexity alternative to other sparse matrix factorization 
approaches such as sparse PCA and self-expressive methods for 
clustering. 

Index Terms —Matrix factorization, subspace learning, column 
subset selection, matrix approximation, sparse recovery, subspace 
clustering. 

1. Introduction 

Data-driven methods for sparse matrix factorization, such 
as sparse PCA (SPCA) [1] and dictionary learning [2], [3], 
approximate data vectors as sparse linear combinations of a 
small set of basis elements. While these simple approaches 
provide an extremely efficient representation of a dataset, 
the bases learned often “mix” points from different low-dimensional 
geometric structures and thus lead to degraded 
classification/clustering performance [4]. 

An alternative approach for revealing low-dimensional 
structure is to let the data “express itself”—to represent each 
element in a dataset in terms of a small subset of samples. Self-expression 
has already been successfully used in the context of 
classification [5], clustering [6], [7], [8], and low-rank matrix 
approximation [4], [9]. 

In contrast to learning a basis, self expression provides a 
provable means by which low-dimensional subspace structures 
can be discovered [6], [10], [11], [8]. The idea underlying 
self-expressive approaches for clustering, such as sparse subspace 
clustering (SSC) [10] and low-rank representations (LRR) 
[7], is to represent the dataset in terms of all of the signals 
in the collection. For a dataset $X \in \mathbb{R}^{m \times N}$ containing N 
data vectors of m dimensions, one computes a representation 
$X \approx XV$, where $V \in \mathbb{R}^{N \times N}$ is a sparse matrix with zeros 
along the diagonal. The resulting sparse matrix V can be 
interpreted as an affinity matrix, where data vectors that lie 
in the same subspace (or cluster) are presumed to use one 
another in their sparse representations. The dataset is then 
clustered by applying spectral clustering methods [12] to the 
graph Laplacian of $A = |V| + |V^T|$. 

While self-expressive methods like SSC and LRR provide a 
principled segmentation of X into low-dimensional subspaces 
(subspace clustering), applying these methods to big datasets 
is challenging. Both SSC and LRR require the construction 
and storage of an $N \times N$ affinity matrix for a dataset of size 
N. Even when low-complexity greedy methods are used to 
populate the affinity matrix, as in SSC-OMP [8], clustering 
the data requires solving an eigenvalue problem for the entire 
affinity matrix, which is intractable for large N. As such, 
the development of efficient solutions for decomposing large 
datasets is essential for both clustering and discovering 
low-dimensional structures in the data. 

A. Our contributions: SEED 

In this paper, we develop a scalable approach for sparse 
matrix factorization that is built upon the idea of using samples 
from the data to “express itself”. Our approach, which we 
refer to as a SElf-Expressive Decomposition (SEED), consists 
of two main steps. In the first step, we select data samples 
by sequentially selecting columns from X that are incoherent 
(uncorrelated) from columns selected at previous iterations. To 
do this, we use a method called oASIS (Accelerated Sequential 
Incoherence Selection) [13]. oASIS operates on a subset of the 
Gram matrix $G = X^T X$ and can thus be used to quickly select 
columns from X without computing the entire Gram matrix. 
In the second step, we use the vectors selected in the first step 
as a basis with which we compute a sparse representation of 
the dataset using a faster variant of the orthogonal matching 
pursuit (OMP) method [14]. We describe SEED in detail in 
Sec. III and provide pseudocode in Alg. 2. 

We demonstrate that SEED provides an effective strategy 
for matrix approximation, both in theory (Thm. 2) and in 
practice (Fig. 3). In particular, Thm. 2 provides a sufficient 
condition for oASIS to return a subset of columns $X_S$ that 
captures the full range of the data, i.e., $X = X_S X_S^+ X$. 
This condition, called exact matrix recovery, highlights two 
attractive properties of our proposed approach for column 
selection: (i) it naturally selects linearly independent columns 
and thus provides a highly efficient representation of low rank 
matrices, and (ii) it provides an estimate of the representation 
error remaining in the dataset after each iteration. This error 
estimate can be used to stop the algorithm when explicit 
evaluation of the error is intractable. 

Following our analysis, we demonstrate how SEED can be 
applied to aid in and/or solve numerous problems including: 
matrix approximation, clustering, denoising, and outlier detection 
(Sec. V). We evaluate the performance of SEED for 
these applications on several real-world datasets, including 
three image datasets and a collection of neural signals from 
the motor cortex. Our results demonstrate that SEED provides 
a scalable alternative to other sparse decomposition methods 
such as SPCA, SSC, and nearest neighbor (NN) methods at a 
fraction of the computational cost. 

B. Paper Organization 

This paper is organized as follows. In Sec. II we provide 
background on column subset selection, sparse recovery, and 
sparse subspace clustering. In Sec. III, we introduce SEED and 
then provide motivating examples and a complexity analysis. 
In Sec. IV, we develop a sufficient condition for exact matrix 
recovery with SEED for low rank matrices and datasets living 
on unions of independent subspaces. In Sec. V, we study the 
performance of SEED for four applications: (i) matrix approximation, 
(ii) denoising, (iii) sparse representation-based learning, 
and (iv) outlier detection. Finally, we end with concluding 
remarks (Sec. VI) and further details of our approach for 
column selection in Appendix A. 

C. Notation and Preliminaries 

We denote matrices X with uppercase bold script and 
vectors x with lowercase bold script. We write the (i, j) entry 
of a matrix X as $X_{ij}$. Let [A B] denote the column-wise 
concatenation of A and B. Let $X^+$ denote the left pseudoinverse 
of X. The orthogonal projection of X onto the span of the 
columns indexed by S is defined as $\pi_S(X) = (X_S X_S^+) X$. 
The Frobenius norm is defined as $\|X\|_F = \sqrt{\sum_{ij} X_{ij}^2}$. The 
support of a vector x, supp(x), indexes its nonzero elements. 
The sparsity of x equals |supp(x)|. We denote the columns 
of X not indexed by the set S as $X_{-S}$. We denote entry-wise 
multiplication of A and B by $A \circ B$, and “colsum” returns 
the sums of the columns of its argument. 

We say that a collection of N signals of dimension 
m lie on a union of p subspaces in $\mathbb{R}^m$ when 
each signal $x_i \in U = \cup_{i=1}^{p} S_i$, where the dimension of 
the i-th subspace $S_i$ equals $k_i < m$. The matrix X contains 
independent subspaces when $\mathrm{rank}(X) = \dim(U) = \sum_{i=1}^{p} k_i$ 
and $k_i$ is the subspace dimension. If $\mathrm{rank}(X) < \sum_{i=1}^{p} k_i$, 
then the subspaces are overlapping. 
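
As a quick numerical illustration of this definition (a sketch that is not part of the original experiments; the dimensions, the random seed, and all variable names are arbitrary assumptions), the following NumPy snippet builds one dataset from independent subspaces and one from overlapping subspaces and compares rank(X) with the sum of the subspace dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_per = 30, 20  # ambient dimension and points per subspace (assumed values)


def sample_from(basis, n):
    """Draw n random points from the span of the columns of `basis`."""
    return basis @ rng.standard_normal((basis.shape[1], n))


# Independent case: two random subspaces of dimension 3 and 5 in R^30.
B1 = rng.standard_normal((m, 3))
B2 = rng.standard_normal((m, 5))
X_indep = np.hstack([sample_from(B1, n_per), sample_from(B2, n_per)])
rank_indep = np.linalg.matrix_rank(X_indep)      # matches 3 + 5

# Overlapping case: the second subspace reuses one direction of the first,
# so the rank falls below the sum of the subspace dimensions.
B2_overlap = np.hstack([B1[:, :1], rng.standard_normal((m, 4))])
X_overlap = np.hstack([sample_from(B1, n_per), sample_from(B2_overlap, n_per)])
rank_overlap = np.linalg.matrix_rank(X_overlap)  # smaller than 3 + 5
```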

II. Background and Related Work 
A. Column Subset Selection 

Consider a dataset of N vectors in $\mathbb{R}^m$, each represented by 
a column of $X \in \mathbb{R}^{m \times N}$. The task of identifying L columns 


that best represent the entire matrix X is referred to as column 
subset selection (CSS). The CSS problem is formulated as 
follows: 

$$\text{(CSS)} \qquad \min_{|S| = L} \|X - \pi_S(X)\|_F.$$

The CSS objective aims to find a set of L columns from X 
(indexed by the set S) that best approximate X in the 
least-squares sense. For a collection of signals that lie on a 
k-dimensional subspace, every submatrix $X_S \in \mathbb{R}^{m \times k}$ of X 
with linearly independent columns will yield exact matrix recovery, i.e., 

$$\|X - \pi_S(X)\|_F = 0.$$
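
The CSS objective is straightforward to evaluate for a given index set. The sketch below is illustrative only: the helper name `css_error`, the synthetic rank-3 data, and the index sets are assumptions rather than anything taken from the paper. It computes the projection through the pseudoinverse and shows that k independent columns of a rank-k matrix drive the error to numerical zero, while fewer columns do not.

```python
import numpy as np


def css_error(X, S):
    """Frobenius norm of X minus its projection onto the columns indexed by S."""
    XS = X[:, S]
    # pi_S(X) = X_S X_S^+ X, with X_S^+ the pseudoinverse of X_S
    projection = XS @ (np.linalg.pinv(XS) @ X)
    return np.linalg.norm(X - projection, "fro")


rng = np.random.default_rng(0)
k, m, N = 3, 20, 50
# Synthetic rank-k data: every column lies on a k-dimensional subspace of R^m.
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, N))

err_full = css_error(X, [0, 1, 2])  # k independent columns: exact recovery
err_short = css_error(X, [0, 1])    # too few columns: nonzero residual
```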

Unfortunately, (CSS) is believed to be NP-hard, since it 
requires a brute force search for the sub-matrix of X that 
provides the best approximation. However, a large body 
of literature in random and adaptive column selection has 
emerged over the past few years [15], [9]. While uniform 
random sampling is the easiest and most well-studied sampling 
method, a number of adaptive selection criteria have been 
proposed to reduce the number of samples required to achieve 
a target approximation error. Adaptive approaches include: (i) 
leverage-based sampling [4], [16] and (ii) sequential error- 
based selection (SES) approaches [9]. Leverage sampling 
requires computing a low-rank SVD of the data matrix to 
determine which columns exert the most influence over its 
low-rank approximation. After computing the so-called “leverage 
scores” for the columns of the dataset, columns are drawn 
randomly based upon their leverage score. SES strategies 
select columns based upon how well they are approximated 
by the current sample set [9]: the probability of selecting the 
i-th column $x_i$ is $p(i) \propto \|x_i - \pi_S(x_i)\|_2$. While 
SES strategies are highly effective in practice, these methods 
are very costly, because they require computing an $m \times N$ 
residual error matrix at each selection step. In contrast, the 
proposed column selection strategy (oASIS) requires computing 
and operating on only a $k \times N$ matrix at each step, where 
k is the iteration number and $k < L < m$. 

B. Convex Approach for Subset Selection 

Recently, a convex approach was proposed to find 
representative columns from X [17]. This self-expressive method 
selects representative columns from X by solving: 

$$\min_{V \in \mathbb{R}^{N \times N}} \|V\|_{2,1} \quad \text{subject to} \quad X = XV,$$

where $\|V\|_{2,1}$ is the sum of the $\ell_2$-norms of the rows of V. By 
penalizing the rows of V in this way, we minimize the number 
of non-zero rows in V, which in turn minimizes the number of 
columns of X needed to represent the dataset. This approach 
is known to reveal representative columns from collections of 
data [17], [18] and also aid in hyperspectral unmixing [19]. 
However, this approach requires solving for an $N \times N$ matrix 
and cannot be used directly to enforce sparsity in the entries 
of V as in SEED because group sparsity norms are known to 
produce dense estimates within each group. 

Algorithm 1: Orthogonal Matching Pursuit (OMP) 
Input: Input signal x, dictionary D containing L unit-norm 
vectors in its columns, termination condition (either the 
target sparsity k or approximation error $\epsilon$). 

Output: A sparse coefficient vector v. 

Initialize: Set the residual r to the input signal, r = x, and 
set the support set $\Lambda = \emptyset$. 

1. Select the column of D that is maximally correlated with 
r and add it to $\Lambda$: 

$$\Lambda \leftarrow \Lambda \cup \arg\max_{j = 1, \ldots, L} |\langle d_j, r \rangle|.$$

2. Update the residual $r = x - D_\Lambda D_\Lambda^+ x$. 

3. Repeat steps (1-2) until the norm of the residual $\|r\|_2 < \epsilon$ 
or $|\Lambda| = k$. 

4. Return the sparse coefficient vector v, with nonzero 
coefficients $v_\Lambda = D_\Lambda^+ x$ and $v(i) = 0$, $\forall i \notin \Lambda$. 


C. Greedy Sparse Recovery 

Sparse recovery methods aim to form an approximation 
$x \approx Dv$ consisting of a small number of nonzero coefficients 
in v. Greedy methods for sparse recovery, such as orthogonal 
matching pursuit (OMP) [20], select columns from D (atoms) 
iteratively, subtracting the contribution of each selected atom 
from the current signal residual. This selection process is 
then repeated until a stopping criterion is satisfied: either a 
target sparsity $\|v\|_0 = k$ is reached (Sparse), or the residual 
magnitude becomes smaller than a pre-specified value (Error). 
Pseudocode for the OMP algorithm is given in Alg. 1. 
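
The greedy loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the batch OMP implementation used later in the paper: the function name `omp`, its default arguments, and the toy identity dictionary are hypothetical, and the least-squares step is solved with `numpy.linalg.lstsq` rather than an incremental factorization.

```python
import numpy as np


def omp(x, D, k=None, eps=0.0):
    """Greedy sparse code of x over a dictionary D with unit-norm columns."""
    n_atoms = D.shape[1]
    k = n_atoms if k is None else k
    support = []
    coef = np.zeros(0)
    r = x.astype(float)                   # residual starts at the signal itself
    while len(support) < k and np.linalg.norm(r) > eps:
        corr = np.abs(D.T @ r)            # correlation of every atom with r
        if support:
            corr[support] = 0.0           # never reselect an atom
        j = int(np.argmax(corr))
        if corr[j] == 0.0:                # residual orthogonal to all atoms
            break
        support.append(j)
        # Least-squares fit on the selected atoms, i.e. D_support^+ x
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ coef
    v = np.zeros(n_atoms)
    if support:
        v[support] = coef
    return v


# Toy usage with an orthonormal dictionary, where the 2-sparse code is exact.
D_demo = np.eye(5)
x_demo = np.array([0.0, 2.0, 0.0, -1.0, 0.0])
v_demo = omp(x_demo, D_demo, k=2)
```

Passing `k` corresponds to the (Sparse) stopping rule and passing `eps` to the (Error) rule; both can be combined.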

D. Sparse Subspace Clustering 

A number of methods for learning multiple subspaces from 
data (subspace clustering) use the idea of self-expression to 
represent the data in terms of other signals in the collection; 
such methods lead to state-of-the-art clustering performance 
for unions of subspaces [10], [7]. For instance, sparse subspace 
clustering (SSC) [10] factorizes the dataset $X \in \mathbb{R}^{m \times N}$ by 
solving the following $\ell_1$-minimization problem: 

$$\min_{V \in \mathbb{R}^{N \times N}} \|V\|_1 \quad \text{s.t.} \quad \mathrm{diag}(V) = 0, \; \|X - XV\|_F < \epsilon,$$

where $\epsilon$ is a user-set parameter that controls the error in the 
self-expressive approximation. The idea underlying SSC is 
that each datapoint can be represented as a linear combination 
of a small number of points in the dataset. 

The coefficient matrix V computed via SSC can be 
interpreted as a graph, where the (i, j) entry of the matrix 
represents the edge between the i-th and j-th points in the 
dataset; the strength of each edge represents the likelihood 
that two points live in the same subspace. After forming a 
symmetric affinity matrix $A = |V| + |V^T|$, spectral clustering 
is then performed on the graph Laplacian of the affinity matrix 
to obtain labels (indicating the subspace membership) for all 
the points in the dataset [12]. 
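
To make the affinity construction concrete, the following sketch forms the symmetric affinity matrix and a graph Laplacian from a small coefficient matrix. Several choices here are assumptions made for illustration: the helper name `affinity_and_laplacian`, the hand-built 4 x 4 matrix `V_toy`, and the use of the unnormalized Laplacian (spectral clustering methods often use a normalized variant instead).

```python
import numpy as np


def affinity_and_laplacian(V):
    """Symmetric affinity A = |V| + |V^T| and unnormalized Laplacian L = Deg - A."""
    A = np.abs(V) + np.abs(V.T)
    degree = A.sum(axis=1)
    return A, np.diag(degree) - A


# Toy coefficient matrix: points {0, 1} use each other, as do points {2, 3}.
V_toy = np.zeros((4, 4))
V_toy[0, 1], V_toy[1, 0] = 0.5, -0.3
V_toy[2, 3], V_toy[3, 2] = 0.7, 0.2

A_toy, L_toy = affinity_and_laplacian(V_toy)
eigvals = np.linalg.eigvalsh(L_toy)
# The number of (near-)zero Laplacian eigenvalues equals the number of
# connected components of the affinity graph.
n_components = int(np.sum(eigvals < 1e-10))
```

In this toy case the two groups of mutually expressive points appear as two connected components, which is the structure spectral clustering exploits.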

The motivation underlying SSC is that the sparse representation 
of a signal under consideration will consist of other 
signals from the same subspace. In fact, the SSC procedure 


Algorithm 2: Sparse Self-Expressive Decomposition (SEED) 
Input: A dataset $X \in \mathbb{R}^{m \times N}$, the maximum number of 
columns to select L, a termination criterion for Step 1 $\delta$, and 
a termination criterion for Step 2 (either target sparsity k 
and/or approximation error $\epsilon$). 

Output: A normalized basis $D \in \mathbb{R}^{m \times L}$ and sparse 
coefficient matrix $V \in \mathbb{R}^{L \times N}$. 

Step I. Column Subset Selection: Select L columns via 
oASIS and normalize the selected columns to form $D \in \mathbb{R}^{m \times L}$. 

Step II. Greedy Sparse Recovery: Solve OMP for each 
column of X with respect to D and stack the result into the 
corresponding column of $V \in \mathbb{R}^{L \times N}$. 


leads to provable guarantees of exact feature selection (EFS), 
a condition in which every data point is represented using 
only data from within its own subspace [10], [11], [8], [21]. 
Guarantees for EFS require that there exist at least k linearly 
independent columns in X that span each k-dimensional 
subspace in the dataset. When this occurs, we say that X 
provides a complete reference set for a subspace. In Sec. 
IV-C, we show that when X lies on a union of independent 
subspaces, SEED can be guaranteed to return a set of columns 
$X_S$ that provides a complete reference set for all of the 
subspaces present in the dataset. 

III. Sparse Self-Expressive Decomposition (SEED) 

In this Section, we provide a description, pseudocode (Alg. 
2), and complexity analysis of SEED. 

A. SEED Method 

As we described in the Introduction, SEED aims to form an 
approximation $X \approx DV$ such that V is sparse and D contains 
as few normalized columns of X as necessary to represent 
the data. Our solution consists of two steps, which we now 
describe in detail. 

1) Step 1: Column Selection: In Step 1 of SEED, we 
select columns from X that form a good low-dimensional 
approximation to the dataset. To do this, we employ a 
method called Accelerated Sequential Incoherence Selection 
(oASIS) [13], an adaptive strategy that selects columns that 
are incoherent (uncorrelated) from one another. oASIS was 
originally designed to compute low rank factorizations of 
positive semidefinite kernel matrices used in a wide range of 
machine learning applications. Here, we show how oASIS can 
be used in a novel way, for column selection, by finding a low 
rank approximation to the Gram matrix $G = X^T X$. 

As a motivating example, we show the samples (images) 
selected via oASIS and random sampling for a dataset consisting 
of faces under various illumination conditions (Fig. 1). 
We observe that oASIS returns a set of images that are highly 
varied in terms of their illumination (left) because we select 
columns that are incoherent from those selected at previous 
iterations. In contrast, random sampling (right) returns a set 
of images with highly redundant illumination conditions and 



Fig. 1. Incoherence sampling from face images. Face images from one 
subject selected via oASIS (left) and random sampling (right). The images 
selected via oASIS represent a wide range of diverse illumination conditions 
whereas random sampling selects a number of similar (redundant) illumination 
conditions. 


is thus less efficient at expressing the range of the dataset. We 
provide details (Appendix A) and pseudocode (Alg. 3) for our 
implementation of oASIS. 

To motivate the selection criterion used in oASIS, suppose 
that we have already selected k columns from X, indexed 
by the set S. Without loss of generality, we reorder 
$X = [X_S \; X_{-S}]$. Let $W = X_S^T X_S$ and $C = X^T X_S$. The 
least-squares approximation of G in terms of the subsampled 
columns C (i.e., the Nyström approximation [22]) is given 
by 

$$G_k = C W^+ C^T.$$

We assume that $X_S$ contains linearly independent columns and 
thus inversion of W is possible. In Sec. IV-A, we will show 
that oASIS is guaranteed to only select linearly independent 
columns and thus this assumption is justified. 
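
The Nyström approximation can be checked numerically on a small example. The snippet below is a sketch under assumed settings (synthetic rank-4 data, a fixed seed, and a hypothetical helper `nystrom`); it only illustrates that the approximation becomes exact once the sampled columns span the data, and it is not the oASIS implementation itself.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, r = 15, 40, 4
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, N))  # rank-r data
G = X.T @ X                                                    # Gram matrix


def nystrom(G, S):
    """Nystrom approximation G_k = C W^+ C^T built from the columns indexed by S."""
    C = G[:, S]              # N x k block of sampled columns
    W = G[np.ix_(S, S)]      # k x k block at the sampled rows and columns
    return C @ np.linalg.pinv(W) @ C.T


def rel_err(S):
    """Relative Frobenius error of the Nystrom approximation."""
    return np.linalg.norm(G - nystrom(G, S)) / np.linalg.norm(G)


err_two = rel_err([0, 1])         # two columns cannot capture a rank-4 Gram matrix
err_four = rel_err([0, 1, 2, 3])  # four independent columns give numerical zero
```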

Now, we would like to determine which column to add to 
our current approximation $G_k$ (Fig. 2). If we consider a new 
column $c_i = X^T x_i$, the resulting upper left $(k+1) \times (k+1)$ 
block of the approximation can be written as 

$$X_{k+1}^T X_{k+1} = \begin{bmatrix} X_S^T X_S & b_i \\ b_i^T & d_i \end{bmatrix} \approx \begin{bmatrix} X_S^T X_S & b_i \\ b_i^T & b_i^T W^{-1} b_i \end{bmatrix},$$

where $b_i = X_S^T x_i$, $d_i = x_i^T x_i$, and $X_{k+1} = [X_S \; x_i]$. 

If the new column lies in the span of the columns already 
selected, then the projection $b_i^T W^{-1} b_i = d_i$. However, the 
discrepancy between these two scalar quantities provides a measure 
of how poorly W represents a candidate column $c_i$. Thus, without 
computing the entire column and measuring its projection onto the 
span of the sampled columns, we can instead approximate the 
influence that $c_i$ will have on the current approximation by 
measuring the discrepancy between our estimate $b_i^T W^{-1} b_i$ 
and $d_i$. Using this insight, we employ the following greedy 
strategy to decide which column to add to our approximation at 
iteration k + 1: 

1) Permute the rows and columns of G such that the first k 
columns correspond to the columns that we have already 
sampled and form $W_k$. 

2) Let $b_i$ denote the first k entries of column i, and let 
$d_i$ denote the diagonal entry in this column. For each 
unsampled column, calculate $\Delta_i$, the lower bound on 
how much it changes the Nyström approximation: 

$$\Delta_i = d_i - b_i^T W_k^{-1} b_i.$$



Fig. 2. Column selection with oASIS. At each step of oASIS, we project 
b (in green) onto the span of W (upper left red block) and select the 
column that produces the largest deviation when compared with its corresponding 
diagonal entry d, i.e., we select the column that has maximal deviation 
$\Delta = |d - b^T W^{-1} b|$. Our proposed column selection strategy only depends 
on knowledge of the red shaded regions (a subset of columns/rows of G) and 
does not depend on the gray shaded region containing the inner products 
between X and the unsampled columns $X_{-S}$ in question. 


3) Select the unsampled column with maximum $|\Delta_i|$ and 
add it to S. 

4) If the selected value of $|\Delta_i|$ is smaller than a user-set 
threshold, then terminate. Otherwise, let $k \leftarrow k + 1$, and 
return to Step 1. 

At each iteration, one needs to compute the coefficient 
matrix (represented by $W_k^{-1}$ above). We can speed up the 
naive implementation above by performing rank-1 updates 
to the coefficient matrix every time a column is added. For 
the sake of completeness, we provide a brief description and 
pseudocode for oASIS in the Appendix. See [13] for a full 
discussion of oASIS and its application to low rank kernel 
matrix approximation. 
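
The four-step greedy loop can be sketched naively as follows. This is an illustrative sketch and not the accelerated oASIS algorithm of [13]: it re-solves with W at every iteration instead of applying rank-1 updates, it starts from a single assumed column (`init=0`) rather than a random initialization, and the function name `oasis_select` and the relative tolerance `tol` are hypothetical choices.

```python
import numpy as np


def oasis_select(X, L, tol=1e-8, init=0):
    """Greedily pick up to L column indices of X by maximizing |Delta_i|."""
    d = np.einsum("ij,ij->j", X, X)       # diagonal of G = X^T X
    S = [init]
    while len(S) < L:
        C = X.T @ X[:, S]                 # N x k sampled columns of G
        W = C[S, :]                       # k x k block at the sampled indices
        R = np.linalg.solve(W, C.T)       # W^{-1} C^T, one column per candidate
        # Delta_i = d_i - b_i^T W^{-1} b_i, where b_i is the i-th row of C
        delta = d - np.sum(C.T * R, axis=0)
        delta[S] = 0.0                    # sampled columns are already exact
        i = int(np.argmax(np.abs(delta)))
        if abs(delta[i]) < tol * d.max():  # stop once no column adds anything
            break
        S.append(i)
    return S


rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))  # orthonormal basis, rank 5
X_demo = Q @ rng.standard_normal((5, 100))
S_demo = oasis_select(X_demo, L=10)

XS = X_demo[:, S_demo]
residual = np.linalg.norm(X_demo - XS @ (np.linalg.pinv(XS) @ X_demo))
```

Note that only the sampled columns of G and its diagonal are ever formed, which mirrors the point made in Fig. 2; on this rank-5 example the loop is expected to stop on its own after five columns.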

2) Step 2: Greedy Sparse Recovery: In Step 2 of SEED, 
we compute the sparse representations of the columns of X in 
terms of the set of columns selected in Step 1, given by $X_S$. To 
do this, we first normalize all the columns in $X_S$ to have unit 
$\ell_2$-norm; let $D \in \mathbb{R}^{m \times L}$ denote the corresponding matrix of 
normalized datapoints. Without loss of generality, we reorder 
$X = [X_S \; X_{-S}]$ and compute its sparse decomposition as 
$X = DV$, where $V = [\mathrm{diag}(\alpha) \; W]$ and $\alpha \in \mathbb{R}^L$ is a vector 
containing the $\ell_2$-norm of the i-th column in $X_S$ in its i-th entry. 
The columns of W are computed by solving an accelerated 
version of OMP designed to efficiently compute the sparse 
representations of a batch of signals, called batch orthogonal 
matching pursuit (batch OMP) [14]. Unlike convex 
optimization-based approaches for sparse recovery that use the $\ell_1$-norm 
[23], one can constrain either the total approximation error 
for each column (Error) or constrain the sparsity (Sparse). 
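
A minimal sketch of Step 2 is given below, assuming the index set S has already been chosen. It is not the batch OMP implementation of [14]: the helpers `omp` and `seed_step2` are hypothetical names, plain per-column OMP with a sparsity limit stands in for batch OMP, and the columns of V are kept in the original ordering of X rather than permuted.

```python
import numpy as np


def omp(x, D, k):
    """Plain OMP with a sparsity limit k (stand-in for batch OMP)."""
    support, coef, r = [], np.zeros(0), x.astype(float)
    for _ in range(k):
        corr = np.abs(D.T @ r)
        if support:
            corr[support] = 0.0           # never reselect an atom
        j = int(np.argmax(corr))
        if corr[j] <= 1e-12:              # residual already explained
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ coef
    v = np.zeros(D.shape[1])
    if support:
        v[support] = coef
    return v


def seed_step2(X, S, k):
    """Normalize the selected columns and sparsely code every column of X."""
    alpha = np.linalg.norm(X[:, S], axis=0)   # l2-norms of the selected columns
    D = X[:, S] / alpha                       # unit-norm basis
    V = np.zeros((len(S), X.shape[1]))
    V[np.arange(len(S)), S] = alpha           # sampled columns: the diag(alpha) block
    sampled = set(S)
    for i in range(X.shape[1]):
        if i not in sampled:
            V[:, i] = omp(X[:, i], D, k)      # unsampled columns: sparse codes
    return D, V


rng = np.random.default_rng(4)
X_demo = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 12))  # rank 3
S_demo = [0, 1, 2]
D_demo, V_demo = seed_step2(X_demo, S_demo, k=3)
```

Here the three selected columns span the rank-3 data, so the product of `D_demo` and `V_demo` reproduces `X_demo` up to rounding.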

B. Variant of Alg. 2 

We now introduce a variant of SEED that can be used for 
outlier detection and clustering applications. This variant 
modifies the way in which we compute the sparse representation 
of the sampled signals $X_S$. If we reorder $X = [X_S \; X_{-S}]$, 
we can write the sparse matrix $V = [V_S \; V_{-S}]$, where $V_{-S}$ 
contains the sparse representations of the unsampled signals 
in its columns. Rather than simply setting $V_S$ to be a diagonal 
matrix containing the norm of the sampled signals (as in Alg. 
2), we can set $V_S$ to be the solution of the following objective 
function: 

$$\min_{W} \|W\|_0 \quad \text{s.t.} \quad \|X_S - DW\|_F < \epsilon, \; \mathrm{diag}(W) = 0. \qquad (1)$$

The idea behind this variant of SEED is to form a sparse 
representation of each sampled column in terms of other 
selected columns. 


C. Complexity of SEED 

The runtime complexity required to select L columns in 
Step 1 of SEED (oASIS) is $O(NL^2)$, requiring storage of a 
$t \times N$ matrix at step t. In Step 2 of SEED, we must compute 
a sparse approximation of each datapoint, where the runtime 
complexity of OMP for an $m \times L$ matrix is $T \approx k^3 + kmL$. 
Thus the complexity of computing V equals $N(k^3 + kmL)$, 
which for small $k < L < m \ll N$ is roughly $O(NL^2)$. Thus, 
the total complexity of both steps is given by $O(NL^2)$. 

To contrast the complexity of SEED with other approaches 
for clustering, the complexity of computing k nearest 
neighbors for a collection of N data points is $O(N^2 \log(N))$. The 
complexity of SSC-OMP [8] is dominated by forming a sparse 
representation of each column of X with respect to an $m \times (N - 1)$ 
matrix, which has $O(N^2)$ runtime complexity. 

IV. Results for Exact Matrix Recovery 

In this Section, we develop sufficient conditions for exact 
matrix recovery: this occurs when the projection of X onto 
the subspace spanned by the subset $X_S$ gives us back exactly 
X, i.e., $X = \pi_S(X)$. To prove our main result for exact 
matrix recovery (Thm. 2), we begin by first proving that at 
each iteration of oASIS, the algorithm selects samples that are 
linearly independent from those selected at previous iterations 
(Lem. 1). We then show how the application of oASIS to 
the Gram matrix of X is guaranteed to provide exact matrix 
recovery. 


A. Independent Selection Property of oASIS 

If we assume that X is rank r, then exact matrix recovery 
occurs when $X_S$ contains at least r linearly independent 
columns from X. In Lemma 1 below, we provide a sufficient 
condition that describes when oASIS will return a set of r 
linearly independent columns. 

Lemma 1: At each step of oASIS, the i-th column of the 
Gram matrix G is linearly independent from the previously 
selected columns provided that $\Delta(i) > 0$. 

Proof. We proceed by induction. Let $G_S$ denote the set of 
columns from $G = X^T X$ already selected at the previous 
iterations and let $W_k = X_S^T X_S$ denote the square matrix 
consisting of the entries of G at the selected row and column 
indices after k columns have been selected. Assume that $W_k$ 
is invertible since S consists of linearly independent columns 
from G. Now, consider selecting the i-th column of G and 
forming a new $W_{k+1}$ given by 

$$W_{k+1} = \begin{bmatrix} W_k & b_{k+1} \\ b_{k+1}^T & d_{k+1} \end{bmatrix},$$

where $b_{k+1} = X_S^T x_i$ is a column vector corresponding to the 
inner products between the newly selected column and the 
previously selected columns (indexed by S) and $d_{k+1} = x_i^T x_i$ 
is equal to $G_{ii}$. This matrix is invertible provided the Schur 
complement of $W_k$ in $W_{k+1}$ is non-zero. The Schur complement 
is $d_{k+1} - b_{k+1}^T W_k^{-1} b_{k+1} = \Delta_{k+1}(i)$. Thus, if $\Delta_{k+1}(i)$ is 
nonzero, then $W_{k+1}$ contains linearly independent columns, 
and thus the columns of G from which $W_{k+1}$ is drawn must 
also be linearly independent. As long as we initialize oASIS 
with a set of linearly independent columns, it is guaranteed to 
select linearly independent columns provided that $\Delta_{k+1}(i) > 0$ 
for all i corresponding to unselected columns. Our result 
follows by induction. □ 
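
The role of the Schur complement in the proof of Lemma 1 can be made explicit with the standard block factorization below. This worked step is a supplementary sketch in the notation of the lemma, and it assumes that $W_k$ is invertible:

```latex
W_{k+1}
=
\begin{bmatrix} W_k & b_{k+1} \\ b_{k+1}^T & d_{k+1} \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ b_{k+1}^T W_k^{-1} & 1 \end{bmatrix}
\begin{bmatrix} W_k & b_{k+1} \\ 0 & d_{k+1} - b_{k+1}^T W_k^{-1} b_{k+1} \end{bmatrix},
\qquad
\det(W_{k+1}) = \det(W_k)\,\bigl(d_{k+1} - b_{k+1}^T W_k^{-1} b_{k+1}\bigr).
```

Since the first factor has unit determinant, $W_{k+1}$ is invertible exactly when the scalar Schur complement is nonzero, which is the condition used in the proof.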

Remark. Lemma 1 guarantees that oASIS will return a set 
of r linearly independent columns in r steps as long as the 
selection criterion $\Delta(i) \neq 0$ holds before exact reconstruction 
occurs. Unfortunately, in the pathological case in which the 
algorithm fails with $\Delta(i) = 0$ before r columns have been 
selected, the algorithm may terminate early. While it is possible 
to construct pathological matrices where this occurs, we have 
not observed this early termination in practice. The following 
theorem shows that when the entries of the Gram matrix are 
drawn from a continuous random distribution, the algorithm 
succeeds with probability 1. 

Theorem 1: Suppose that the entries of the Gram matrix 
G are drawn from a continuous random distribution. Assume 
that oASIS is initialized by randomly selecting fewer than 
r columns from X. Then oASIS succeeds in generating r 
linearly independent columns with probability 1. 

Proof. We begin by noting that the randomly chosen 
initialization columns of the Gram matrix G have full rank with 
probability 1. This is because the matrix G is a random matrix 
drawn from a continuous distribution, and the set of singular 
matrices has positive co-dimension and thus measure 0. The 
probability of choosing a set of linearly dependent vectors by 
chance is thus zero. 

Suppose now that k − 1 columns have already been selected, 
where $k - 1 < r$. We wish to show that it is always possible 
to select column k. The result then follows by induction. 

Observe that the algorithm fails to choose the k-th vector 
only if 

$$d_i = b_i^T W_{k-1}^{-1} b_i, \quad \forall i \in \{1, 2, \ldots, N\}, \qquad (2)$$

where $d_i$ denotes the diagonal entry in column i. By 
construction, condition (2) holds for $i \in \{1, 2, \ldots, k - 1\}$ (the 
rank-$(k-1)$ approximation used by oASIS on iteration k perfectly 
represents these columns). For columns k through r, the 
quantity $b_i^T W_{k-1}^{-1} b_i$ is known (i.e., it is not a random variable 
because it can be computed using only values of the sampled 
columns 1 through k − 1). However, for such columns, $d_i$ is 
a continuous random variable, and thus the probability of (2) 
holding for $i \geq k$ is 0. □ 


B. Exact Matrix Recovery 

Using Lemma 1, we now prove that oASIS returns a sample 
subset that yields exact matrix recovery. To do this, we use 
the fact that the Gram matrix $G = X^T X$ from which oASIS 
selects columns spans the same space as X. Thus, when 
we select r linearly independent columns from G, this also 
guarantees that the same set of columns from X are linearly 
independent. This idea is made precise in the following. 

Lemma 2: Let X be a rank r matrix and $G_S$ be the set 
of columns selected via oASIS from the corresponding Gram 
matrix. Exact matrix recovery of X occurs when $\mathrm{rank}(G_S) = r$. 

Proof. Recall that the Gram matrix $G = X^T X$ has the 
same rank as X. The columns of G indexed by S 
equal $G_S = X^T X_S$, and $\mathrm{rank}(G_S) = \mathrm{rank}(X^T X_S) = 
\mathrm{rank}(X_S)$. Thus, when $\mathrm{rank}(G_S) = r$, this implies that 
$\mathrm{rank}(X_S) = r$, which by assumption equals the rank of the 
full dataset X. Therefore, when $\mathrm{rank}(G_S) = r$, exact recovery 
of X is guaranteed. □ 
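
Lemma 2 can be sanity-checked numerically. The snippet below is an illustrative sketch with assumed synthetic rank-4 data and an arbitrary index set; it compares the rank of the sampled Gram columns with the rank of the sampled data columns and then measures the recovery error.

```python
import numpy as np

rng = np.random.default_rng(3)
m, N, r = 12, 30, 4
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, N))  # rank-r data

S = [0, 1, 2, 3]                        # assumed index set of r sampled columns
XS = X[:, S]
GS = X.T @ XS                           # sampled columns of the Gram matrix

rank_GS = np.linalg.matrix_rank(GS)
rank_XS = np.linalg.matrix_rank(XS)

# Exact matrix recovery: projecting X onto span(X_S) returns X itself.
recovered = XS @ (np.linalg.pinv(XS) @ X)
rel_err = np.linalg.norm(X - recovered) / np.linalg.norm(X)
```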

To state our main result for matrix recovery with oASIS, 
we must make the following assumption. 

Assumption 1: For a rank r matrix, oASIS returns a set of 
r columns before terminating with $\max_i \Delta(i) = 0$. 

We are now equipped to state our main result. 

Theorem 2 (Exact Recovery Condition): Let X be a rank r 
matrix. Exact matrix recovery occurs as long as Assumption 
1 is satisfied. 

Proof. To prove Thm. 2, we must simply combine Lemma 
1 with Lemma 2. To be precise, Lemma 1 states that oASIS 
will return a set of r linearly independent columns from G 
provided that $\Delta(i) > 0$. Thus, as long as the algorithm does 
not terminate with $\Delta(i) = 0$, this implies that we return a 
set of r linearly independent columns from G indexed by the 
index set S. Now, using Lemma 2, we have that when oASIS 
returns a subset of columns $G_S$ such that $\mathrm{rank}(G_S) = r$, 
then exact recovery is guaranteed for X based upon the 
corresponding subset $X_S$. □ 

C. Exact Recovery from Unions of Independent Subspaces 

Guarantees for EFS for SSC [11] and SSC-OMP [8] rely 
on the assumption that the dataset X provides a complete 
reference set for each low-dimensional subspace present in 
the data (see Sec. II-D). To make this precise, assume that the 
points in X are drawn from a union of p subspaces, where 
$U = \cup_{i=1}^{p} S_i$ and $\dim(S_i) = k_i$. We say that X provides a 
complete reference set for U if X contains at least $k_i$ points 
from $S_i$ for all $i \in \{1, \ldots, p\}$. 

It follows from the definition of exact matrix recovery that, 
if U is a union of independent subspaces and $\dim(X) = 
\sum_i k_i$, then whenever $X_S$ yields exact matrix recovery we 
are guaranteed that $X_S$ also provides a complete reference set 
for U. This is the main condition required to prove EFS in 
[8], [11]; thus, when SEED returns a subset of the data with 
at least k columns for each of its k-dimensional subspaces, 
we can compute the corresponding covering radius of each 
subspace (see [8] for further details). Thus, as long as exact 
recovery occurs, we can apply the theory in [8], [11] to 
produce guarantees that EFS occurs for the decomposition 
obtained via SEED. This result follows from combining Thm. 
1 in [8] with our condition for exact recovery in Thm. 2. 


V. Numerical Experiments 

In this Section, we first introduce the datasets used in 
our evaluations and then evaluate the performance of SEED 
for matrix approximation, clustering, denoising, and outlier 
detection. 

A. Datasets and Evaluation Setup 

We now describe the datasets used in our evaluations. 

• The Face dataset consists of images (each with 4032 
pixels) of ten subjects' faces under various illumination 
conditions, resulting in a dataset of size 4032 x 631 [24]. 

• The hyperspectral (HS) dataset consists of 204 images 
from the Salinas scene, where each image contains spatial 
information about the scene at a different spectral band. 
Each image is 512x217 pixels and after removing pixels 
without labels, the total dataset is of size 204 x 54129. 
This dataset consists of 16 types of vegetation (classes) 
and the spectral signatures associated with each class are 
very low-dimensional (approximately 3-10 dimensions). 

• The MNIST dataset contains 50,000 images of 10 
handwritten digits (28 × 28 pixels), which results in a dataset 
of size 784 × 50,000 [25]. 

• The Neuro dataset consists of the firing rates of 187 
neurons in motor area (M1) collected at N = 875 time 
points, when a monkey is performing a center-out reach 
task moving from a center position to one of 8 targets 
[26]. The resulting dataset produces a data matrix of size 
187 × 875. 

• The union-of-subspaces (UoS) dataset is a synthetic dataset
consisting of signals drawn from a union of two subspaces
of dimension $k = 20$ with a 3-dimensional overlap
(three coordinates are fully shared by both subspaces).
We then add a collection of outliers created by generating
random Gaussian vectors. This results in a dataset of size
$N = 450$ with $N_1 = 300$ points in the first subspace,
$N_2 = 100$ points in the second subspace, and $N_o = 50$
outlier points.
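As an illustration of how a dataset like UoS could be generated, the following is a minimal sketch rather than the procedure used in our experiments. The ambient dimension `M = 100`, the use of shared orthonormal basis directions to model the 3-dimensional overlap, and the function name `make_uos_dataset` are all assumptions made for the example.

```python
import numpy as np

def make_uos_dataset(M=100, k=20, overlap=3, n1=300, n2=100, n_out=50, seed=0):
    """Two k-dim subspaces sharing `overlap` directions, plus Gaussian outliers.

    M (ambient dimension) is an assumed value; the text does not state it.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal directions: `overlap` shared + (k - overlap) private per subspace.
    n_dirs = overlap + 2 * (k - overlap)
    Q, _ = np.linalg.qr(rng.standard_normal((M, n_dirs)))
    shared = Q[:, :overlap]
    B1 = np.hstack([shared, Q[:, overlap:k]])   # basis of subspace 1 (M x k)
    B2 = np.hstack([shared, Q[:, k:n_dirs]])    # basis of subspace 2 (M x k)
    X1 = B1 @ rng.standard_normal((k, n1))      # points in subspace 1
    X2 = B2 @ rng.standard_normal((k, n2))      # points in subspace 2
    Xo = rng.standard_normal((M, n_out))        # outliers: random Gaussian vectors
    X = np.hstack([X1, X2, Xo])
    # Labels: 0 and 1 for the two subspaces, -1 for outliers.
    labels = np.concatenate([np.zeros(n1, int), np.ones(n2, int), np.full(n_out, -1)])
    return X, labels
```

With the defaults, the inlier columns span a subspace of dimension 2k minus the overlap, i.e., 37.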

For our evaluations on the HS and MNIST datasets, we
used an OpenMPI implementation of SEED written in C++.
This parallelizes both oASIS for faster column selection and 
batch OMP for faster sparse representation computation. The 
experiments utilized 72 total processor cores with 4 GB of 
RAM per core. For the smaller datasets, we ran all of our 
evaluations in MATLAB with a single desktop processor. 

B. Matrix Approximation 

A well-studied application of CSS is the approximation of
low rank matrices [4]. As SEED utilizes a fast sequential
algorithm for column selection (oASIS), our approach provides
an effective strategy for matrix approximation. As shown
in Sec. IV-A, oASIS selects linearly independent
columns at each step and thus obtains exact recovery (to
machine precision) of rank-$r$ matrices using $r$ samples (this
is observed in practice in Fig. 3). In contrast, random and
leverage-based sampling exhibit a significantly slower decay
in their approximation error.




Fig. 3. Approximation error versus size of factorization. The relative approximation error is displayed as a function of the factorization size (L/N) for
SEED, error-based sampling (SES), and random sampling for: (a) Faces, (b) Neuro, (c) MNIST, (d) HS, and (e) UoS. For the smaller datasets in (a,b,e), we also
display the error for leverage sampling, PCA, and SPCA.


1) Results for Matrix Approximation: To evaluate the performance
of SEED for matrix approximation, we compute the
approximation error as a function of decomposition size. We
compare the error when sampling the dataset with: (i) oASIS,
(ii) sequential error selection (SES) [9], (iii) uniform random
sampling, and when possible (iv) leverage score sampling [4].
In addition, we also compute the error for PCA and SPCA
using the generalized power method in [27]. The relative
approximation error is

$$\mathrm{err}(X, D) = \frac{\|X - D D^{+} X\|_F}{\|X\|_F}.$$
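The error measure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not our evaluation code: it assumes the norm is the Frobenius norm, and the function name `relative_error` is hypothetical.

```python
import numpy as np

def relative_error(X, D):
    """Relative error of projecting X onto the column span of D.

    Assumes the Frobenius norm. D^+ X is obtained by least squares,
    which avoids forming the pseudoinverse explicitly.
    """
    coeffs, *_ = np.linalg.lstsq(D, X, rcond=None)   # minimum-norm solution, equals D^+ X
    return np.linalg.norm(X - D @ coeffs, "fro") / np.linalg.norm(X, "fro")
```

For a rank-$r$ matrix, passing any $r$ linearly independent columns of X as D drives this quantity to machine precision.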


Figure 3 displays the approximation error for all five
datasets as a function of the relative factorization size L/N.
oASIS achieves exact matrix recovery for (b-e) when the 
number of points sampled equals the rank. This is in agreement 
with our result for matrix recovery in Thm. 2. This result 
suggests that our guarantee for exact matrix recovery is not 
overly restrictive and that SEED produces linearly independent 
sample sets in a wide range of real and synthetic datasets. 

Interestingly, we observe similar decay in the approximation 
error for both the Neuro and synthetic UoS datasets: (i) 
the error achieved for SEED, PCA, and SPCA are roughly 
equivalent (all three achieve exact recovery with the number of 
factors/samples equal to the rank), (ii) SES trails behind SEED 
but also quickly achieves exact recovery, and (iii) random and 
leverage sampling flatline and do not achieve exact recovery. 
This suggests that the Neuro dataset is likely to contain both 
low rank structures as well as outliers that make random 
sampling significantly less effective for matrix approximation 
than SEED. 


C. Sparse Representation-Based Learning 

Sparse representation-based approaches to classification [5]
and clustering [6] leverage the fact that signals from the
same class (or subspace) will use one another in their sparse
representations. Using this fact, the sparsity patterns of a
self-expressive decomposition such as SSC can be used to
cluster the data using either a spectral clustering or consensus
method [28] (see Sec. II-D for more details). In fact, recent
studies have shown that, when the dataset lives on a union of
subspaces, the sparse representation of a point from a subspace
will only consist of points from the same subspace [6], [10],
[11], [21], [8].


Just as in SSC, we can use the decomposition provided
by SEED to cluster the data using the sparsity patterns in
V. However, because V is rectangular, standard spectral
clustering approaches for square affinity matrices cannot be
used. Rather, we can think of V as representing the edges of
a bi-partite graph and thus co-clustering methods can be used
in place of standard graph clustering methods. The spectral
co-clustering algorithm introduced in [29] provides an elegant
relaxation of the problem of finding a minimum cut through a
bi-partite graph. An interesting consequence of using a spectral
co-clustering approach is that, when we eventually solve an
eigenvalue problem to find the minimum cut, we compute the
second largest singular vector of an L × N matrix rather than the
second smallest eigenvector of an N × N matrix. This enables us
to exploit simple iterative methods for leading singular vectors
rather than computing an entire SVD.
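To make the co-clustering step concrete, the following is a rough two-way sketch in the spirit of the spectral co-clustering algorithm of [29]; it is not our implementation. It assumes edge weights $|V_{ij}|$, degree normalization of rows and columns, and a small 1-D two-means step on the second singular vectors. For brevity it calls a full SVD, whereas an iterative solver for the leading singular vectors would be used at scale. The function name `spectral_cocluster_2way` is hypothetical.

```python
import numpy as np

def spectral_cocluster_2way(V, n_iter=20):
    """Split rows and columns of V into two co-clusters (illustrative sketch)."""
    A = np.abs(V)                                  # bipartite edge weights
    d1 = A.sum(axis=1)                             # row degrees
    d2 = A.sum(axis=0)                             # column degrees
    d1 = np.where(d1 > 0, d1, 1.0)                 # guard against empty rows
    d2 = np.where(d2 > 0, d2, 1.0)                 # guard against empty columns
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    # Second left/right singular vectors, rescaled by the degrees.
    z = np.concatenate([U[:, 1] / np.sqrt(d1), Vt[1] / np.sqrt(d2)])
    # 1-D two-means on z, initialized at its extremes.
    centers = np.array([z.min(), z.max()])
    assign = np.zeros(z.size, dtype=int)
    for _ in range(n_iter):
        assign = (np.abs(z - centers[0]) > np.abs(z - centers[1])).astype(int)
        for c in (0, 1):
            if np.any(assign == c):
                centers[c] = z[assign == c].mean()
    L = A.shape[0]
    return assign[:L], assign[L:]                  # row labels, column labels
```

Rows (sampled points) and columns (all points) that receive the same label form one co-cluster.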

1) Results for Sparse Representation-Based Learning: In
Figure 4, we show a visualization of the embedding of a
union of five overlapping subspaces; we show the first three 
coordinates of the embedding and plot the projection of the 
unsampled signals as red dots and the projection of sampled 
signals as blue stars. This result provides evidence that: (i) 
co-clustering provides a feasible and efficient strategy for 
clustering the data with SEED and (ii) our proposed sampling 
strategy is capable of separating the data with far fewer 
samples than random sampling. 

To quantify the performance of SEED for sparse
representation-based clustering, we compute the cost of a
normalized cut as we vary the size of the factorization L.
The cost of a normalized cut is a measure of how easy it
is to cluster a bi-partite graph into its correct classes. This
cost is defined as follows [29]: Let $R_k$ index the rows of V
corresponding to points in the $k$th class, $C_k$ index the set of
columns corresponding to the points in the $k$th class, and $\Omega$
index all N points in the dataset. The cost of the normalized
cut for the $k$th class is

$$\mathrm{ncut}(V, k) = \frac{\sum_{i \notin R_k,\, j \in C_k} |V_{ij}|}{\sum_{i \in \Omega,\, j \in C_k} |V_{ij}|}.$$
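As a hedged illustration, the cost for one class can be computed as below. The sketch assumes that the numerator sums $|V_{ij}|$ over columns in the class and rows outside the class, and that the denominator sums $|V_{ij}|$ over all rows for columns in the class; the function name and the label-array interface are hypothetical.

```python
import numpy as np

def normalized_cut_cost(V, row_labels, col_labels, k):
    """Fraction of edge weight from class-k columns that leaves class k.

    Assumed form: sum_{i not in R_k, j in C_k} |V_ij| / sum_{i, j in C_k} |V_ij|.
    """
    A = np.abs(V)
    in_rows = np.asarray(row_labels) == k          # R_k
    in_cols = np.asarray(col_labels) == k          # C_k
    total = A[:, in_cols].sum()                    # all weight attached to C_k
    cut = A[~in_rows][:, in_cols].sum()            # weight to rows outside R_k
    return cut / total if total > 0 else 0.0
```

Averaging this value over all classes gives a single score per factorization size; a value of zero means no class-k column uses a sampled point from another class.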

In our subsequent experiments, we compare the average cost
of a normalized cut for column sampling-based approaches
with SSC-OMP and the NN graph. We note that both SSC-OMP
and NN exhibit $O(N^2)$ complexity, which makes them
impractical for large datasets.



































Fig. 4. Visualization of co-clustering with SEED. Visualization of the
embedding of V for a union of five overlapping 20-dimensional subspaces.
Pairs of subspaces have at most a 10-dimensional intersection and
the rank of the dataset is r = 150. On the left, we show the embedding for
L = {100, 200, 300} samples selected via random sampling and on the right,
we show the embedding for SEED. To aid in visualization, we draw ellipses
around samples from each subspace (each cluster) and display the sampled
points as blue (*) and unsampled points as red (o).


For the Neuro dataset, we achieve lower cut ratios than SSC
(0.1755) and NN (0.1613) for all of the column sampling-based
approaches after sampling 20% of the samples. With
30% of the samples, the column sampling-based methods
achieve cut ratios of 0.1085 (SEED), 0.1342 (SES), 0.1512
(Lev), and 0.1412 (Rand). For these experiments, we set $k_{\max} = 5$
and $\epsilon = 0.7$. For higher values of $k_{\max}$ and lower values
of $\epsilon$, we observe that SSC-OMP provides the best cut ratios.
Our results suggest that, while leverage and random sampling
provide poor schemes for matrix approximation (as evidenced
by Fig. 3), all of the column sampling approaches provide
comparable performance in terms of their cut ratios.

In Fig. 5, we display the normalized cut ratios for the Faces
dataset for six (a) and twenty (b) different subjects. We observe
similar decay in the cut ratios for both datasets, where SEED
and SES achieve normalized cuts less than NN and SSC-OMP
when we sample 35% of the dataset. The gap between SSC-OMP
and SEED grows as we increase the number of samples
to 50%. The performance of leverage and random sampling
appears to flatline just above the cut ratio for SSC and NN
methods.

In many of the datasets that we tested, we observe that
subsampling-based approaches can produce smaller cut ratios
than SSC-OMP and NN methods. This is likely due to the
fact that, as the size of the ensemble grows relative to the
dimension of the underlying cluster, the graphs generated by
SSC and NN become weakly connected and thus produce
larger cut ratios [30], [31]. In contrast, the sparse representations
produced by SEED are built upon self-expressive
bases containing incoherent columns, and thus we observe that
SEED produces sparse graphs that are better connected
(within cluster) and thus produce smaller cut ratios (Sec. V).




[Figure 5: panels (a) and (b); x-axis: Size of Factorization (L/N); legend: Gram, Random, SSC, SES, NN, SEED.]

Fig. 5. Normalized cut ratios for face image database. Normalized cut
ratios vs. size of factorization for collections of (a) six subjects' faces under
sixty different illumination conditions (M = 4032, N = 360) and (b) twenty
subjects' faces under sixty different illumination conditions (M = 4032, N =
1200). In both cases, the data is full rank, i.e., r = 360 and r = 1200 for
(a) and (b), respectively.

D. Denoising 

SEED computes a sparse approximation of each column
of X in terms of a small number of columns from the same
dataset. By constraining the sparsity level of each column of
V in OMP (solving (Sparse)), we obtain an approximation to
the $i$th column as $\hat{x}_i = X_S v_i$. This approximation scheme
provides a denoised version of the original dataset that is
similar in spirit to NN-based denoising. However, rather than
finding the k nearest neighbors and applying a simple averaging
procedure (NN-denoising), SEED finds both an optimal
set of “neighbors” and weights to use at each datapoint. As
a motivating example (Fig. 6), we show the performance of
SEED for clustering hyperspectral image data after denoising
the data with (d) SEED, (c) a random subset of samples,
and after (b) applying $k$-means to the original data. Here, we
observe a significant improvement in clustering after denoising
the data with only L = 30 samples: the $k$-means clustering
error before and after denoising the data with SEED is 31%
and 0.68% respectively.
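The denoising step can be sketched as follows. This is a minimal illustration with a plain (non-batch) OMP, not the batch OMP implementation used in our experiments. It assumes unit-norm atoms and a relative residual tolerance; the names `omp` and `denoise` are hypothetical.

```python
import numpy as np

def omp(D, x, k_max, tol=0.0):
    """Greedy OMP: approximate x with at most k_max columns (atoms) of D."""
    residual = x.astype(float).copy()
    support, coeffs = [], np.zeros(0)
    x_norm = np.linalg.norm(x)
    while len(support) < k_max and np.linalg.norm(residual) > tol * x_norm:
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0                    # never pick an atom twice
        j = int(np.argmax(corr))
        if corr[j] == 0.0:
            break                                  # residual orthogonal to all atoms
        support.append(j)
        # Re-fit all selected coefficients by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    v = np.zeros(D.shape[1])
    v[support] = coeffs
    return v

def denoise(X, S, k_max):
    """Replace each column of X by a k_max-sparse combination of columns X[:, S]."""
    D = X[:, S] / np.linalg.norm(X[:, S], axis=0)  # unit-norm atoms (assumed)
    V = np.column_stack([omp(D, X[:, i], k_max) for i in range(X.shape[1])])
    return D @ V, V
```

Here `S` would hold the column indices chosen by the sampling step, and the first returned matrix plays the role of the denoised data.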

1) Results for Denoising: Due to the noisy nature of hyperspectral
data, we find that SEED is a highly effective strategy
for denoising such datasets. To do this, we apply SEED to
compute the approximation $\hat{X} = DV$, where we set $k_{\max} = 5$ and
the error tolerance for OMP to $\epsilon = 0.2$. After denoising the
data, we then apply a simple $k$-means clustering algorithm
to the columns in $\hat{X}$. In Fig. 6, we display the results from
clustering a subset of the image with only L = 30 points
selected from the entire dataset (N = 54129). With only a
small sample of points, we observe nearly perfect clustering
of the image with SEED. We compare the performance of
SEED with a random sampling-based approach (identical
to the setup for SEED except the sample sets are selected
randomly) and clustering the original data without denoising.
The clustering error is 31% ($k$-means applied to original
data), 15% (denoising with random samples from dataset),
and 0.68% (SEED). While we do not show the clustering
results obtained by denoising the data with PCA, the clustering
error for PCA-based denoising based upon L = 30 principal
components is 62% (significantly higher than clustering the
raw data).



Fig. 6. Clustering hyperspectral images (HSI). We display the results of
clustering a section of the HSI data: (a) ground truth, (b) $k$-means applied
to the original data, (c) denoised data from a random selection of L = 30
columns, and (d) denoised data obtained via SEED for L = 30 columns. The
clustering error is 31% (b), 15% (c), and 0.68% (d).


E. Outlier Detection 

When the dataset lies on a single or multiple low-dimensional
subspaces, we can use SEED to discover outliers.
The idea behind using SEED for outlier detection is that, when we try
to sparsely represent a data point that lies in a low-dimensional
subspace, a small number of data points are required (i.e.,
the representation is sparse). In contrast, when we try to
sparsely represent an outlier, a large number of data points
are needed (i.e., the representation is dense). For instance,
when a collection of signals lie on a union of $k$-dimensional
subspaces, the sparsity of each signal in a $k$-dimensional
subspace is bounded by $k$. We can exploit this rank revealing
property of SEED to determine whether a signal lies on one
of the low-dimensional subspaces in the ensemble or whether
it is an outlier.

To find outliers in the dataset, we form a self-expressive
basis and then form the decomposition X = DV, where
$V = [V_S \; V_{-S}]$. Rather than setting $V_S$ to a diagonal matrix as
in Alg. 2, the sparse coefficients $V_S$ are obtained by solving
the SSC objective in (1). To compute both $V_S$ and $V_{-S}$,
we utilize batch OMP to solve (Error) by providing an error
tolerance $\epsilon$ to the algorithm. Constraining the error (rather
than the sparsity) is important, because our goal is to use the
sparsity level of each column to determine whether it is an
outlier. Once we compute a sparse factorization, we compute
the number of nonzeros in each column of V (sparsity level)
and segment the columns of X based upon a user-set threshold.
When a column of V is dense, we declare it an outlier,
and when it is sufficiently sparse, we declare it an inlier.
In some cases, setting a threshold to segment the data can
be straightforward (when the sparsity levels admit a bi- or
multi-modal distribution). However, in cases where setting the
threshold is difficult, one can use $k$-means to learn a threshold
to split the data.
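The thresholding step can be sketched as below. This is an illustrative sketch under stated assumptions: the coefficient matrix V has already been computed, entries below `zero_tol` in magnitude count as zero, and a two-cluster 1-D $k$-means on the per-column sparsity levels stands in for the learned threshold. The function name `flag_outliers` is hypothetical.

```python
import numpy as np

def flag_outliers(V, n_iter=50, zero_tol=1e-10):
    """Label columns of V as outliers when their sparsity level is in the dense cluster."""
    # Sparsity level: number of (numerically) nonzero coefficients per column.
    nnz = np.count_nonzero(np.abs(V) > zero_tol, axis=0).astype(float)
    lo, hi = nnz.min(), nnz.max()
    if lo == hi:                                   # all columns equally sparse
        return nnz, np.zeros(nnz.size, dtype=bool)
    dense = nnz == hi
    for _ in range(n_iter):                        # 1-D two-means on sparsity levels
        dense = np.abs(nnz - hi) < np.abs(nnz - lo)
        new_lo, new_hi = nnz[~dense].mean(), nnz[dense].mean()
        if (new_lo, new_hi) == (lo, hi):
            break                                  # centers have converged
        lo, hi = new_lo, new_hi
    return nnz, dense                              # dense[j] is True for flagged outliers
```

When the sparsity levels are clearly bimodal, the two cluster centers settle near the two modes and the implied threshold lies midway between them.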

1) Results for Outlier Detection: In Fig. 7, we demonstrate
the rank revealing property of SEED when applied to the
UoS dataset that has been corrupted with outliers. Along the
bottom, we show the sparse coefficient matrices obtained for
(a) SEED (L = 160), (b) Random sampling (L = 160), and
(c) SPCA (L = 60). Above these coefficient matrices, we
show the number of nonzeros (sparsity) of each column. In
the case of SPCA, we set L = 60 because, as we increase L,
we observe an even smaller gap in the sparsity level between
signals living in low-dimensional subspaces and the outliers.
For all of the methods, we compute the sparse coefficients
using OMP, where we set $\epsilon = 0.3$ and do not constrain the
maximum sparsity level. In this case, we can clearly separate
inliers from outliers: the sparsity level of inliers and outliers
is around k = 2 and k = 10, respectively.

Fig. 7. Demonstration of rank revealing property of SEED. The sparsity
level (top row) and sparse coefficient matrices (bottom row) for
(a) SEED (L = 160), (b) Random sampling (L = 160), and (c) SPCA
(L = 60).

Our results suggest that SEED can provide a strategy for 
outlier detection by simply thresholding columns based upon 
their sparsity level. In general, determining an appropriate 
threshold to segment low rank structures from outliers can 
be challenging. However, in practice, we observe that the 
distribution of column sparsity is multi-modal; thus, instead
of setting the threshold explicitly, a $k$-means algorithm can be
employed to find a good threshold to segment outliers.

VI. Conclusions 

This paper introduced SEED, a scalable method for sparse 
matrix factorization that couples a new and provable method 
for column selection (oASIS) with greedy sparse recovery 
(OMP). We have demonstrated how SEED can be applied to 
either assist or solve numerous signal processing and machine 
learning problems, ranging from matrix approximation and 
denoising, to clustering and outlier detection. In addition, we 
have developed a sufficient condition for SEED to achieve 
exact matrix recovery when we sample the same number of 
columns as the rank of the dataset (Thm. 2). In numerical 
experiments, we have shown that this result holds for a number 
of real-world datasets, i.e., we obtain exact recovery after 
sampling a number of columns equal to the matrix rank. This 
is in stark contrast to random sampling, where exact recovery 
cannot be guaranteed. 

Column sampling has been explored extensively in the machine
learning literature for the task of approximating low rank
matrices [22], [15], [9]. In this paper, we applied column selection
to a new class of problems, namely sparse representation-based
clustering/classification and subspace clustering. Thus,
an important contribution of our work is showing how self- 
expressive approaches used in signal processing and computer 
vision can benefit from column selection approaches. 

We have demonstrated that SEED provides self-expressive
bases amenable to solving sparse representation-based learning
and subspace clustering. In the case where the dataset lies
on a union of independent subspaces, we have shown that
our condition for exact recovery (Thm. 2) also implies that
we select at least $k$ linearly independent columns from each
$k$-dimensional subspace in the dataset. However, providing
a bound on the covering radius (how well each subspace is
sampled) with column sampling methods is an open problem
that must be solved to prove stronger results for feature
selection similar to those in [10], [11], [8] for SSC. Extending
our analysis to the case of approximately low rank matrices,
unions of overlapping subspaces, and noisy settings are all
interesting directions for future work.

Algorithm 3: Accelerated Sequential Incoherent Selection
(oASIS)

Inputs: The data matrix $X \in \mathbb{R}^{M \times N}$, an $N \times 1$ vector $d = \mathrm{diag}(G)$,
the maximum number of columns to sample $L$,
the number of columns to sample initially $k < L$, and a
non-negative stopping criterion $\delta$.

Initialize: Choose a vector $S \in [1, N]^k$ of $k$ random
starting indices. Set $C = X^T X_S$, $W^{-1} = (X_S^T X_S)^{-1}$,
$d = \mathrm{diag}(X^T X)$, and $R = W^{-1} C^T$.

while $k < L$ do
    $\Delta \leftarrow d - \mathrm{colsum}(C^T \circ R)$
    $i \leftarrow \arg\max_{j \notin S} |\Delta_j|$
    if $|\Delta_i| < \delta$ then
        return
    end if
    $b \leftarrow X_S^T x_i$, $s \leftarrow 1/\Delta_i$, $q \leftarrow R_{:,i}$, $c \leftarrow X^T x_i$
    $W^{-1} \leftarrow \begin{bmatrix} W^{-1} + s q q^T & -s q \\ -s q^T & s \end{bmatrix}$
    $R \leftarrow \begin{bmatrix} R + s q (q^T C^T - c^T) \\ s(-q^T C^T + c^T) \end{bmatrix}$
    $C \leftarrow [C, c]$
    $k \leftarrow k + 1$
    $S \leftarrow S \cup \{i\}$
end while


Appendix A 

Accelerated Sequential Incoherent Selection
(oASIS)

We now provide a detailed description of our implementation
of oASIS for column sampling. Pseudocode is provided
in Alg. 3.


A. Accelerated Sampling 

A naive implementation of the column sampling approach
described in Sec. III-A1 is inefficient, because each step
requires a matrix inversion to form $W_k^{-1}$ in addition to calculating
the errors $\Delta_i$. Fortunately, this can be done efficiently
by updating the results from the previous step using block
matrix inversion formulas and rank-1 updates. We now provide
a derivation of the algorithm and pseudocode in Alg. 3.

We first consider the calculation of $W_{k+1}^{-1}$ after column
$k + 1$ is added to the approximation. Let $b_{k+1}$ denote the first
$k$ rows of the $(k+1)$th column of $G$ and $d_{k+1}$ denote
its diagonal entry. Using a block inversion formula, we obtain

$$W_{k+1}^{-1} = \begin{bmatrix} W_k & b_{k+1} \\ b_{k+1}^T & d_{k+1} \end{bmatrix}^{-1} = \begin{bmatrix} W_k^{-1} + s_{k+1} q_{k+1} q_{k+1}^T & -s_{k+1} q_{k+1} \\ -s_{k+1} q_{k+1}^T & s_{k+1} \end{bmatrix}, \qquad (3)$$

where $s_{k+1} = (d_{k+1} - b_{k+1}^T W_k^{-1} b_{k+1})^{-1} = \Delta_{k+1}^{-1}$ is the
inverse of the (scalar-valued) Schur complement and $q_{k+1} = W_k^{-1} b_{k+1}$
is a column vector. This update formula allows $W_{k+1}^{-1}$ to
be formed by updating $W_k^{-1}$ and only requires inexpensive
vector-vector multiplication. Note that $W_{k+1}$ is invertible as
long as $\Delta_{k+1}$ (the Schur complement) is non-zero, which is
guaranteed by our sampling rule; the algorithm terminates if
$\Delta_{k+1} = 0$, in which case our approximation is exact.
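The block inversion update can be checked numerically with a short sketch. The function name `grow_inverse` is hypothetical; the code forms $q$ with a matrix-vector product for clarity, whereas in the accelerated algorithm $q$ is read off from a column of the stored matrix $R$.

```python
import numpy as np

def grow_inverse(W_inv, b, d):
    """Inverse of [[W, b], [b^T, d]] obtained from W^{-1} by the block update."""
    q = W_inv @ b                                  # q = W^{-1} b
    s = 1.0 / (d - b @ q)                          # inverse of the Schur complement
    top = np.hstack([W_inv + s * np.outer(q, q), -s * q[:, None]])
    bottom = np.append(-s * q, s)                  # last row: [-s q^T, s]
    return np.vstack([top, bottom])
```

Comparing the output against a direct inverse of the enlarged Gram matrix on random data is a simple way to confirm the signs in the update.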

We now consider the calculation of $\Delta_i = d_i - b_i^T W_k^{-1} b_i$
for all $i$. Note that on step $k$ of the method, we have
$C_k^T = [b_1, b_2, \dots, b_N]$. We can evaluate all values of
$b_i^T W_k^{-1} b_i$ simultaneously by computing the entry-wise product
of $C_k^T$ with the matrix $R_k = W_k^{-1} C_k^T$ and then summing the
resulting columns. If we have already formed $W_k^{-1}$ and $R_k$ on
iteration $k$, then the matrix $R_{k+1} = W_{k+1}^{-1} C_{k+1}^T$ needed on
the next iteration is obtained by applying Eqn. 3 to $W_{k+1}^{-1}$ to
obtain

$$R_{k+1} = \begin{bmatrix} R_k + s_{k+1} q_{k+1} (q_{k+1}^T C_k^T - c_{k+1}^T) \\ s_{k+1} (-q_{k+1}^T C_k^T + c_{k+1}^T) \end{bmatrix}.$$

The update formula above forms $R_{k+1}$ by updating the matrix
$R_k$ from the previous iteration. The update requires only
matrix-vector and vector-vector products. The application of
this fast update rule to perform incoherent sampling yields
Alg. 3. We use this accelerated version of oASIS in all of our
numerical experiments.
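Putting the pieces together, the selection loop can be sketched in NumPy as follows. This is a simplified illustration, not our parallel implementation: it assumes $G = X^T X$, uses an absolute stopping tolerance `delta`, and omits the explicit update of $W^{-1}$ because the loop only needs $R$. The function name `oasis` and its defaults are hypothetical.

```python
import numpy as np

def oasis(X, L, k=1, delta=1e-10, seed=0):
    """Greedy column selection with rank-1 updates of R (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    d = np.einsum("ij,ij->j", X, X)                # diag(X^T X)
    S = [int(j) for j in rng.choice(N, size=k, replace=False)]
    C = X.T @ X[:, S]                              # N x k
    R = np.linalg.inv(X[:, S].T @ X[:, S]) @ C.T   # k x N
    while len(S) < L:
        Delta = d - np.sum(C.T * R, axis=0)        # Schur complements for all columns
        Delta[S] = 0.0                             # exclude already-selected columns
        i = int(np.argmax(np.abs(Delta)))
        if abs(Delta[i]) < delta:
            break                                  # remaining columns are (nearly) in the span
        s = 1.0 / Delta[i]
        q = R[:, i].copy()
        c = X.T @ X[:, i]
        new_row = s * (c - q @ C.T)                # s(-q^T C^T + c^T)
        R = np.vstack([R - np.outer(q, new_row), new_row])
        C = np.column_stack([C, c])
        S.append(i)
    return S
```

On an exactly rank-$r$ matrix, the loop is expected to stop after selecting $r$ columns, provided `delta` sits above the floating-point noise in the Schur complements.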